Neural Networks

Evaluation functions are often difficult to design.

Arthur Samuel, in his pioneering work in machine learning, attempted to have his program discover through search its own checker board evaluation function [Samuel, 1959]. His work has proven difficult to repeat, and today most game board evaluation functions such as those in world chess champion Deep Blue [Hamilton and Garber, 1997] and world checkers champion Chinook [Schaeffer, 1997] are still primarily designed by hand.

Gerald Tesauro, in his TD-Gammon backgammon program, trained a neural network to do game board evaluations [Tesauro, 1995]. His move generator generates all possible moves from a given die roll, and passes each resulting board position to the neural net for evaluation. He trained the neural net by having it play against itself for several days, modifying the neural net weights using the temporal difference learning technique (discussed below). So this system employed search both during development, to find the best set of weights for the neural net, and during gameplay, to find the best move.

Neural Network Basics

Neural networks consist of an input layer, zero or more hidden layers, and an output layer. The output of each neuron is computed by a weighed sum of the inputs to the neuron. A non-linear function such as the sigmoid function is also applied to the output. Training the network consists of finding the set of weights for each neuron input that cause it to produce the desired outputs when the input data set is presented.

Neural network training is also a form of gradient-descent search. The network is a function approximation system which tries to produce the desired output for each training input. The objective of training is to minimize the error across the set of all training inputs. This is the same as finding the minima on a surface, where the altitude is error and the other dimensions are values for the network weights.

TD-Gammon

The TD-Gammon network used 198 input units, 40 hidden units, and 4 output units. The inputs encoded the positions of the pieces on the board. The outputs represented the estimated probability of each player winning from this position.

Temporal Difference Learning

At the beginning of training weights in neural networks are initialized to random values. As each test input is presented the generated output is compared to the desired output, and the weights are adjusted accordingly. The technique used to assign weight changes across both output and hidden layers is called back-propagation. This technique requires a known correct output to compare to the generated output for each input set. In an earlier neural network backgammon program Tesauro actually estimated the probability of winning for many intermediate backgammon board positions, and manually trained the network. For TD-Gammon he instead used the method of temporal difference. This method works much like a mathematical induction proof in reverse. The outputs on turn t+1 are used as the correct values to train the network for turn t. At the end of the game we know the desired output values, and these values ripple back through time during the training.

More recently there has been some question about the contribution of TD learning to the performance of TD-Gammon [Pollack and Blair, 1996]. Tesauro does discuss the unique nature of the game of backgammon and why it is more suitable to neural network learning and evaluation. Major factors include the use of dice to distribute play over the fitness landscape, the one-dimensional nature of the board, and the fact that all games eventually terminate with a win for one player. Pollack and Blair assert that starting from random weights, simple co-evolution between a current champion and a slightly mutated challenger can produce results comparable to TD-Gammon.

Update: At IJCAI-01 Jonathan Schaeffer announced he had successfully trained the Chinook checkers evaluation function using temporal difference learning [Schaeffer, Hlynka and Jussila, 2001].

TD-Gammon image based on [Nilsson, 1998]

back index forward